Abstract: In this paper, we consider the problem of compressive
sensing (CS) recovery when a prior support and information about the
quality of that prior support are available. Different from classical
works, which exploit the prior support blindly, we propose novel
CS recovery algorithms that exploit the prior support adaptively
based on the quality information. We analyze the distortion
bound of the signal recovered by the proposed algorithm and
show that a better quality prior support leads to better CS
recovery performance. We also show that the proposed algorithm
converges in O(log SNR) steps. To tolerate possible model
mismatch, we further propose robust designs to combat
incorrect prior support quality information. Finally, we apply
the proposed framework to sparse channel estimation in massive
MIMO systems with temporal correlation to further reduce the
required pilot training overhead.

I. Introduction 

The problem of recovering a sparse signal from a number of
compressed measurements has been drawing a lot of attention
in the research community [1]. Specifically, consider the
following compressive sensing (CS) model:

    $y = \Phi x$    (1)

where $x \in \mathbb{C}^{N \times 1}$ is the unknown sparse signal
($\|x\|_0 \ll N$), $\Phi \in \mathbb{C}^{M \times N}$ is the measurement
matrix with $M \ll N$, and $y \in \mathbb{C}^{M \times 1}$ is the vector
of measurements. The goal is to recover $x$ based on $y$ and $\Phi$.
Since $M \ll N$, (1) is in fact an underdetermined system and hence
there are in general infinitely many solutions $x$ that satisfy (1).
However, utilizing the fact that $x$ is sparse ($\|x\|_0 \ll N$), it is
possible to recover $x$ exactly via the following formulation [1]:

    $\min_{x} \|x\|_0 \quad \text{s.t.} \quad y = \Phi x.$    (2)

Unfortunately, problem (2) is combinatorial and has prohibitive
complexity [?]. To obtain feasible solutions, researchers
have designed many methods to approximately solve (2).
For instance, the convex approximation approach via $\ell_1$-norm
minimization (basis pursuit) is proposed in [?]. Greedy
algorithms, which focus on iteratively identifying the signal
support (i.e., $T = \{i : x(i) \neq 0\}$) or approximating the
signal coefficients, are proposed in [?] (e.g., orthogonal
matching pursuit (OMP) in [?], iterative hard thresholding
(IHT) in [?], compressive sampling matching pursuit
(CoSaMP) in [?], and subspace pursuit (SP) in [?]). Using
the tools of the restricted isometry property (RIP) [?], these
CS recovery algorithms [?]-[?] are shown to achieve efficient
recovery with substantially fewer measurements than
the signal dimension (i.e., $M \ll N$). Besides, there
are also works that deploy the approximate message passing
technique to achieve efficient CS recovery [?]-[?]. However,
these works [2]-[?] consider one-time static CS recovery and do not
exploit the prior information of the signal support.

In practice, we usually encounter the problem of recovering
a sequence of sparse signals, and the sparsity patterns of
the signals are usually correlated across time. For instance,
consecutive real-time video signals [?] usually have
strong dependencies. In spectrum sensing, the index set of the
occupied frequency bands usually varies slowly [?]. In sparse
channel estimation, consecutive frames tend to share some
multi-paths due to the slowly varying propagation environment
between base stations and users [?], [?]. As such, there is
huge potential to exploit previously estimated signal support to
enhance the CS recovery performance at the present time. In
the literature, some works [11], [15]-[18] have already considered
CS problems with a prior signal support $T_0$ available, and
modified CS algorithms [11], [15]-[18] are proposed to exploit
the prior $T_0$ to enhance the performance. For instance, in [11],
[15]-[17], modified basis pursuit designs are proposed to utilize $T_0$
by minimizing the $\ell_1$-norm of the subvector $x_{T_0^c}$ formed by
excluding the elements of $x$ in $T_0$, where
$T_0^c = \{1, \ldots, N\} \setminus T_0$. Based
on this, [?], [?] have further considered a weighted $\ell_1$-norm
minimization approach to exploit $T_0$. However, these designs
[11], [15]-[18] do not take the quality of the prior support
information $T_0$ into consideration in the problem formulation
and fail to exploit $T_0$ adaptively based on the quality of $T_0$.
In practice, the prior signal support $T_0$ may contain only some
of the correct indices for the present time (e.g., a practical signal
support is temporally correlated but is also dynamic across
time). In cases when only a small part of the indices in $T_0$
is correct, using the modified basis pursuit design in [11],
[15]-[17] (which fully exploits $T_0$) would lead to an even
worse performance [11]. As such, it is desirable to exploit
$T_0$ adaptively based on how good $T_0$ is for the present time.

In this paper, we propose a more complete model regarding 
the prior signal support information. Aside from the prior 

^For instance, a typical modified $\ell_1$-norm minimization [11], [15]-[17] to
exploit the prior support $T_0$ is given by:
$\min_{x} \|x_{T_0^c}\|_1$ s.t. $y = \Phi x$.

^Here, the quality of the prior support $T_0$ refers to how many indices in $T_0$
are correct for the present time. Please refer to Section II for the details.
support $T_0$, we assume that there is a metric to further indicate
the quality of $T_0$. Based on this, we design novel algorithms to
exploit $T_0$ adaptively based on the quality indicator to achieve
better signal recovery performance. Different from previous
works [11], [15]-[18] with convex relaxation approaches, we
propose a greedy pursuit approach to achieve our target.
To cover more application scenarios, we consider a
framework with a general signal model which incorporates the
conventional block sparsity [19], [20] and multiple measurement
vector (MMV) joint sparsity models [21]-[23]. There are
several technical challenges to tackle in this work:

• Algorithm Design to Adaptively Exploit the Prior Support:
Note that classical CS works [11], [15]-[18] exploit the
prior support information $T_0$ blindly. To further enhance
the recovery performance, we shall design a novel CS
algorithm to exploit the prior support $T_0$ adaptively based
on the metric information indicating how good $T_0$ is. On
the other hand, the proposed CS algorithm should also
take the general signal sparsity model into consideration.

• Performance Analysis of the Proposed Algorithm:
Besides the algorithm design, it is also important to quantify
the performance of the proposed CS recovery
algorithms. For instance, it is desirable to analyze the
distortion bound of the recovered signal and
to characterize the associated convergence speed of the
proposed algorithm.

• Robust Designs to Combat Model Mismatch: In practice,
there might be occasions with mismatch in the prior
support information model (e.g., incorrect information of
the prior support quality). For robustness, it is also
desirable to have some alternative robust designs to make
sure that the proposed scheme works efficiently even with
model mismatch.

In this paper, we address the above challenges. In
Section II, we introduce the CS problem setup with a general
signal sparsity model. We then present a prior support information
model and introduce the metric to quantify the quality
of the prior support $T_0$. In Section III, we present the
proposed CS algorithm to adaptively exploit the prior support
based on the quality indicator. After that, in Section IV, we
analyze the recovery performance of the proposed algorithm,
and in Section V, we further propose some robust designs to
tolerate model mismatch with incorrect prior support quality
information. Based on these results, in Section VI, we apply
the proposed scheme to sparse channel estimation in massive
MIMO systems with temporal correlation, to demonstrate the
usefulness of the proposed framework. Numerical results in
Section VII demonstrate the performance advantages of the
proposed scheme over the existing state-of-the-art algorithms.

Notations: Uppercase and lowercase boldface letters denote
matrices and vectors, respectively. The operators $(\cdot)^T$, $(\cdot)^*$,
$(\cdot)^H$, $(\cdot)^\dagger$, $|\cdot|$, and $O(\cdot)$ are the transpose,
conjugate, conjugate transpose, Moore-Penrose pseudoinverse, cardinality,
and big-O notation operators, respectively; supp(h) is the index set of
the non-zero entries of vector h; $\|A\|_F$, $\|A\|$ and $\|a\|$ denote
the Frobenius norm and spectral norm of A and the Euclidean norm
of vector a, respectively.

^The focus of this work is on greedy pursuit based designs, and the
detailed explanation for the selection of the considered algorithm is given at the
beginning of Section III. Note that there may be other approaches to exploit the
prior support information, such as designs based on approximate message
passing [?], [?], which innately operates on the prior information of the signal.
A detailed investigation of other approaches is an interesting research direction
for future work.

II. System Model 
A. Compressive Sensing Model 

Suppose we have compressed measurements $Y \in \mathbb{C}^{M \times L}$
of an unknown sparse signal matrix $X \in \mathbb{C}^{N \times L}$ given by

    $Y = \Phi X + N$    (3)

where $\Phi \in \mathbb{C}^{M \times N}$ is the measurement matrix
and $N \in \mathbb{C}^{M \times L}$ is the measurement noise. Our target is
to recover $X$ based on $Y$ and $\Phi$. Before presenting the
recovery algorithm, we first describe the considered signal
sparsity model and the prior support information for $X$ in the
following sections.


B. Signal Sparsity Model 

Many works have considered CS problems with joint sparsity
structures in the literature. For instance, block sparsity
is considered in [19], [20], in which the target sparse vector
(i.e., $L = 1$ in (3)) has simultaneously zero or non-zero blocks
with block size $d$. On the other hand, the MMV problem is
discussed in [21]-[23], where the target sparse matrix ($L > 1$)
has simultaneously zero or non-zero rows. By exploiting the
joint sparsity structures, better recovery performance can be
achieved compared with conventional CS algorithms [?]-[?].
Motivated by these works, we consider a general
sparsity model for $X$ in (3) so that the conventional block sparsity
or MMV sparsity structure can be incorporated. Suppose the
sparse matrix $X \in \mathbb{C}^{N \times L}$ ($N = Kd$) is a concatenation
of $K$ chunks, where each chunk is of size $d \times L$ and has
simultaneously zero or non-zero entries. Denote $X[k] \in \mathbb{C}^{d \times L}$
as the $k$-th chunk of $X$, as in Figure 1, i.e.,

    $X = \begin{bmatrix} X[1] \\ X[2] \\ \vdots \\ X[K] \end{bmatrix} \in \mathbb{C}^{N \times L}, \quad X[k] \in \mathbb{C}^{d \times L}.$    (4)


Define the chunk support $T$ of $X$ (with chunk size $d \times L$, assumed
throughout the paper) as

    $T = \{n : \|X[n]\|_F > 0, \ 1 \le n \le K\}.$    (5)

We formally have the following definition of chunk-sparse
matrices.

Definition 1 (Chunk Sparsity Level): Matrix $X \in \mathbb{C}^{N \times L}$ is
said to have $s$-th chunk sparsity level (CSL) if the chunk
support $T$ of $X$ as in (5) satisfies $|T| = s \ll K$. ■
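The chunk support in (5) can be illustrated with a short sketch. The function name `chunk_support` and the list-of-rows representation of $X$ are assumptions made here for illustration only; chunk indices are 1-based to match (5).

```python
import math

def chunk_support(X, d):
    """Return the chunk support {n : ||X[n]||_F > 0} with 1-based chunk indices.

    X is a list of N rows (each a list of L real or complex numbers), and
    N must be a multiple of the chunk height d (hypothetical representation).
    """
    N = len(X)
    if N % d != 0:
        raise ValueError("number of rows must be a multiple of d")
    K = N // d
    support = set()
    for n in range(1, K + 1):
        rows = X[(n - 1) * d : n * d]          # rows of the n-th chunk
        frob = math.sqrt(sum(abs(v) ** 2 for row in rows for v in row))
        if frob > 0:
            support.add(n)
    return support

# Toy example: K = 3 chunks of size d x L = 2 x 2; only chunk 2 is non-zero.
X_toy = [[0, 0], [0, 0], [1, 0], [0, 2j], [0, 0], [0, 0]]
toy_support = chunk_support(X_toy, d=2)
```

In this toy case the chunk sparsity level is $|T| = 1$ with $K = 3$. Setting `d=1` treats every row as its own chunk, which corresponds to the MMV row-sparsity case.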

Note that when $d = 1$ and $L > 1$, the considered signal
model is reduced to the MMV joint sparsity models [21]-[23];
when $L = 1$, the considered signal model is reduced
to the block sparsity scenarios [19], [20]; and when both
$d = 1$ and $L = 1$, the considered model degenerates to the
classical signal sparsity model without structures [1]. As such,
the considered sparse signal model incorporates conventional
sparse signal models [?]-[?] and potentially covers more
application scenarios. Note that practical signals $X$ may have
some joint sparsity (e.g., due to physical collocation [24]
or specific application features such as magnetic resonance
imaging [25]), and using a proper signal model enables us
to exploit the joint sparsity to enhance the signal recovery
performance (as demonstrated in [19]-[23]). In this paper, we
assume that the CSL of the target signal $X \in \mathbb{C}^{N \times L}$ is upper
bounded by $s$, i.e., $|T| \le s$, and that $s$ is available, as in classical
CS works [?].

Figure 1. Illustration of the available prior support and quality information
$(T_0, s_c)$ for $X$. Note that $T_0$ (available) and $T$ (unknown) denote the prior
support information and the current signal support, respectively. Our target
is to exploit the side information $(T_0, s_c)$ to improve the CS recovery
performance of $X$ from its compressed measurements $Y$.



$x_T$           subvector formed by collecting the
                entries of $x$ indexed by $T$.

$X_{[T]}$       submatrix formed by collecting the
                chunks of $X$ indexed by $T$.

$\Phi_T$        submatrix formed by collecting the
                columns of $\Phi$ indexed by $T$.

$\Phi_{[T]}$    submatrix formed by collecting the columns
                of $\Phi$ indexed by $\{(k-1)d+1, \ldots, kd : \forall k \in T\}$.


Table I
Notations.


C. Prior Support Information

We consider the case where the following prior support information of $X$
is available.

Definition 2 (Prior Support Information): The prior support
information regarding $X$ is characterized
by a tuple $(T_0, s_c)$, where $s_c \le |T_0| \le s$ and $|T_0 \cap T| \ge s_c \ge 0$.
■

Remark 1 (Interpretation of Definition 2): Note that $T_0$ denotes
the prior signal support and the parameter $s_c$ is a metric
to indicate the quality of the prior support $T_0$. Specifically,
a larger $s_c$ means that a larger number of indices in $T_0$ are
correct and hence means a better quality of $T_0$. Compared with
conventional works [11], [15]-[18], which exploit $T_0$ blindly,
we further consider some uncertainty information about the
prior support $T_0$ (quantified by $s_c$), and such information allows
us to exploit $T_0$ adaptively based on $s_c$. Note that $s_c$ refers to
the number of correct indices but not the specific indices in
$T_0 \cap T$.

We summarize the challenge we face in the following,
and we propose a novel CS algorithm to handle the challenge
in the next section.

Challenge 1: Recover the chunk-sparse matrix $X$ from $Y$ in
(3) by exploiting the prior support information $(T_0, s_c)$.

III. Algorithm Design to Exploit the Prior
Support and Quality Information

In this section, we propose a novel CS recovery
algorithm to solve Challenge 1 by extending conventional
greedy pursuit algorithms with exploitation of $(T_0, s_c)$ and
adaptation to the chunk sparsity structure of $X$. Specifically,
we choose to build on SP [?] among the greedy
CS recovery algorithms, because SP [?] possesses many good
properties such as a uniform recovery guarantee [?], a relatively
smaller required RIP constant compared with other schemes
such as CoSaMP [?] or IHT [?] (based on the best RIP constants
known so far for these schemes [?], [?]), and closed-form
characterizations of the number of iteration steps [?].
Hence, building on SP [?] might enable us to obtain
similar good properties. Moreover, the support identification
operations in SP [?] provide an easy interface
to incorporate the prior support information $(T_0, s_c)$. The
detailed algorithm design is presented in the following.

A. Algorithm Design


In [?], a subspace pursuit (SP) algorithm is proposed to
solve conventional CS problems. The basic idea of the SP is
to keep identifying the signal support based on the maximum
correlation criterion [?], and by doing so, the SP algorithm
achieves efficient CS recovery with robustness to measurement
noise. In this section, we propose a modified subspace pursuit
(M-SP) algorithm to solve Challenge 1 with exploitation
of $(T_0, s_c)$ and adaptation to the chunk sparsity model. To
facilitate the presentation, we first define a set of notation
rules in Table I. The details of the proposed M-SP algorithm
are presented in Algorithm 1.

Remark 2 (Interpretation of Algorithm 1): In the proposed
M-SP algorithm (Algorithm 1), $\gamma$ is a threshold parameter, and
$\hat{T}_l$ and $\hat{X}_{(l)}$ denote the estimated chunk support and the
estimated signal for $X$ in the $l$-th iteration, respectively. Note
that when $d = 1$, $L = 1$ and $s_c = 0$, Algorithm 1
degenerates to the conventional SP [?] (except that the M-SP
has a different stopping criterion). Table II gives the
comparison between the conventional SP and the proposed M-SP.
The following explains how the proposed M-SP exploits
the prior support information $(T_0, s_c)$ and adapts to the chunk
sparsity model:

^Note that the more sophisticated stopping conditions in the M-SP
algorithm (compared with those in the conventional SP [?]) enable us to obtain
more complete convergence results. For instance, as illustrated in Table II,
the conventional SP only characterizes the number of convergence steps in
noise-free cases, while our results cover both noise-free and noisy scenarios.
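Algorithm 1 Modified-SP to Solve Challenge 1.

Input: $Y$, $\Phi$, $s$, $(T_0, s_c)$, $\gamma$, $d$.

Output: Estimated $\hat{T}$ and $\hat{X}$.

Step 1 (Initialization): Initialize the iteration index $l = 0$,
chunk support $\hat{T}_l = \emptyset$, and the residue matrix $R_{(l)} = Y$.

Step 2 (Iteration): Repeat the following steps until stop.

• A (Support Merge): Set $T_a = \hat{T}_l \cup (T_b \cup T_c)$, where

    $T_b = \arg\max_{|T_1| = s_c,\ T_1 \subseteq T_0} \|(\Phi^H R_{(l)})_{[T_1]}\|_F,$    (6)

    $T_c = \arg\max_{|T_2| = s - s_c,\ T_2 \subseteq \{1,..,K\} \setminus T_b} \|(\Phi^H R_{(l)})_{[T_2]}\|_F.$    (7)

• B (LS Estimation): Set $Z_{[T_a]} = \Phi_{[T_a]}^\dagger Y$ and
$Z_{[\{1,...,K\} \setminus T_a]} = 0$.

• C (Support Refinement): Select $\hat{T}_{l+1}$ as follows:

    $\hat{T}_{l+1} = T_1^\star \cup \arg\max_{|T_2| = s - s_c,\ T_2 \subseteq \{1,..,K\} \setminus T_1^\star} \|Z_{[T_2]}\|_F, \quad T_1^\star = \arg\max_{|T_1| = s_c,\ T_1 \subseteq T_0} \|Z_{[T_1]}\|_F.$    (8)

• D (Signal Estimation): Set $\hat{X}_{(l+1)[\hat{T}_{l+1}]} = \Phi_{[\hat{T}_{l+1}]}^\dagger Y$ and
$\hat{X}_{(l+1)[\{1,...,K\} \setminus \hat{T}_{l+1}]} = 0$.

• E (Residue): Compute $R_{(l+1)} = Y - \Phi_{[\hat{T}_{l+1}]} \hat{X}_{(l+1)[\hat{T}_{l+1}]}$.

• F (Stopping Condition and Output): If $\|R_{(l+1)}\|_F \le \gamma$,
stop and output $\hat{T} = \hat{T}_{l+1}$ and $\hat{X} = \hat{X}_{(l+1)}$; Else if
$\|R_{(l+1)}\|_F > \|R_{(l)}\|_F$, stop and output $\hat{T} = \hat{T}_l$ and
$\hat{X} = \hat{X}_{(l)}$; Else, set $l = l + 1$ and go to Step 2 A.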
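The steps of Algorithm 1 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`m_sp`, `top_chunks`, and so on), the 0-based chunk indices, the `max_iter` safeguard, and the tie-breaking rule are all hypothetical choices made here. It relies on the fact that maximizing the Frobenius norm over a set of chunks of fixed size amounts to picking the chunks with the largest individual norms.

```python
import numpy as np

def chunk_cols(chunks, d):
    """0-based column indices of Phi covered by the given 0-based chunk indices."""
    return np.concatenate([np.arange(k * d, (k + 1) * d) for k in sorted(chunks)])

def chunk_norms(M, d):
    """Frobenius norm of every d-row chunk of a (K*d x L) matrix."""
    return np.linalg.norm(M.reshape(M.shape[0] // d, -1), axis=1)

def top_chunks(scores, size, candidates):
    """The `size` candidate chunks with the largest scores (ties: smaller index first)."""
    return set(sorted(candidates, key=lambda k: (-scores[k], k))[:size])

def ls_on_support(Phi, Y, chunks, d):
    """Least-squares estimate supported on `chunks`; all other chunks are zero."""
    Z = np.zeros((Phi.shape[1], Y.shape[1]), dtype=complex)
    if chunks:
        cols = chunk_cols(chunks, d)
        Z[cols] = np.linalg.pinv(Phi[:, cols]) @ Y
    return Z

def m_sp(Y, Phi, s, T0, sc, gamma, d, max_iter=50):
    """Sketch of M-SP; T0 holds 0-based chunk indices, max_iter is an added safeguard."""
    K = Phi.shape[1] // d
    everything, T0 = set(range(K)), set(T0)
    T_l, X_l, R_l = set(), np.zeros((K * d, Y.shape[1]), dtype=complex), Y
    for _ in range(max_iter):
        # Step 2A: merge s_c chunks from T0 and s - s_c chunks from the rest.
        corr = chunk_norms(Phi.conj().T @ R_l, d)
        T_b = top_chunks(corr, sc, T0)
        T_c = top_chunks(corr, s - sc, everything - T_b)
        T_a = T_l | T_b | T_c
        # Step 2B: least squares on the merged support.
        z = chunk_norms(ls_on_support(Phi, Y, T_a, d), d)
        # Step 2C: refine to s_c chunks from T0 plus s - s_c other chunks.
        T_1 = top_chunks(z, sc, T0)
        T_next = T_1 | top_chunks(z, s - sc, everything - T_1)
        # Steps 2D and 2E: signal estimate and residue.
        X_next = ls_on_support(Phi, Y, T_next, d)
        R_next = Y - Phi @ X_next
        # Step 2F: stopping conditions.
        if np.linalg.norm(R_next) <= gamma:
            return T_next, X_next
        if np.linalg.norm(R_next) > np.linalg.norm(R_l):
            return T_l, X_l
        T_l, X_l, R_l = T_next, X_next, R_next
    return T_l, X_l

# Noise-free toy problem: K = 30 chunks of size 2 x 2, three active chunks,
# and a prior support with two correct indices out of three (s_c = 2).
rng = np.random.default_rng(0)
M_meas, K, d, L, s = 40, 30, 2, 2, 3
Phi = rng.standard_normal((M_meas, K * d)) / np.sqrt(M_meas)
T_true = {2, 10, 20}
X_true = np.zeros((K * d, L))
X_true[chunk_cols(T_true, d)] = 1.0 + rng.random((s * d, L))
Y = Phi @ X_true
T_hat, X_hat = m_sp(Y, Phi, s, T0={2, 10, 25}, sc=2, gamma=1e-8, d=d)
```

Calling `m_sp` with `sc=0`, `d=1` and a single-column `Y` gives the degenerate case that Remark 2 relates to the conventional SP, up to the stopping rule.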


                 | $s_c = 0$                                    | $s_c > 0$
$d = 1$, $L = 1$ | No prior supp. info.,                        | Prior supp. info.,
                 | conventional sparse signal                   | conventional sparse signal
                 | (Covered by SP & M-SP)                       | (Covered by M-SP)
$d > 1$ or       | No prior supp. info.,                        | Prior supp. info.,
$L > 1$          | jointly sparse signal                        | jointly sparse signal
                 | (Covered by M-SP)                            | (Covered by M-SP)

Comparison ($s_c = 0$, $d = 1$, $L = 1$)

Scenario                   | Result                                           | SP          | M-SP
no noise, $y = \Phi x$     | Performance, i.e., $\hat{x} = x$                 | covered     | covered
                           | Convergence, # iterations $n_{co}$               | covered     | covered
noisy, $y = \Phi x + n$    | Performance, i.e., $\|x - \hat{x}\| \le O(\|n\|)$ | covered     | covered
                           | Convergence, # iterations $n_{co}$               | not covered | covered


Table II

Comparison of the proposed M-SP and SP [?], [26], [27].


• Exploitation of Prior Support Information $(T_0, s_c)$:
Note that the information $(T_0, s_c)$ is utilized in Steps 2A
and 2C of Algorithm 1. As can be seen in Step 2A, the
newly added support (i.e., $T_b \cup T_c$) contains two parts,
namely $T_b$ and $T_c$, where $T_b$ with size $s_c$ is selected
from the prior support $T_0$, and $T_c$ with size $s - s_c$ is selected
from $\{1,..,K\} \setminus T_b$. This design utilizes the fact that the prior
support $T_0$ contains at least $s_c$ correct indices. Similarly,
in Step 2C, the refined signal support $\hat{T}_{l+1}$ contains
two parts, i.e., $s_c$ indices from $T_0$ and another $s - s_c$
from the others. This is in contrast to the conventional SP
[?], in which the newly added/updated signal support is
blindly selected over the entire signal index set $\{1,..,K\}$.
Using the proposed support identification criterion, the
prior support information $T_0$ is utilized adaptively based
on the quality information $s_c$, and hence better recovery
performance may be achieved.

• Adaptation to the General Sparsity Model: Note that
we have considered a general sparsity model in which
the signal matrix $X$ has simultaneously zero or non-zero
entries within each chunk (with size $d \times L$). Therefore,
instead of identifying each single element in $X$ separately
(as in the conventional SP [?]), we identify each non-zero
chunk as an atomic unit based on the aggregate
correlation effects between the measurement matrix $\Phi$
and the residue matrix $R_{(l)}$. For instance, in (6)-(8),
we identify a new chunk based on the Frobenius norm
of $(\Phi^H R_{(l)})_{[k]}$, which corresponds to an aggregate
correlation effect due to the $k$-th chunk. This design
adapts to the joint sparsity structure in $X$ and may achieve
better recovery performance [19]-[23].

After giving the details of the proposed M-SP algorithm
above, it is also important for us to characterize the associated
recovery performance. Specifically, we are interested in
characterizing the distortion bound of the estimated signal as well
as the convergence speed of the proposed M-SP algorithm. We
discuss these issues in the next section.

IV. Performance Analysis of the Proposed M-SP 


In this section, we analyze the performance of the proposed
M-SP algorithm by deploying the tools of the restricted isometry
property (RIP). Specifically, we are interested in
both the estimation distortion (e.g., $\|X - \hat{X}\|_F$) and the
convergence speed of Algorithm 1. Based on the results, we
further derive some simple insights regarding how the prior
support quality $s_c$ affects the recovery performance.

Challenge 2: Analyze the distortion of the estimated signal $\hat{X}$
and the associated convergence speed for Algorithm 1.

A. Preliminaries 

In the literature, the RIP [?] is commonly adopted to facilitate
the performance analysis of CS recovery algorithms. However,
the conventional RIP [?] only serves to handle general sparse
signal vectors without sparsity structures. To deal with
CS problems with block sparsity structures, the authors in
[20] further propose the notion of block-RIP by extending
the conventional RIP [?]. This block-RIP [20] can also be
deployed to facilitate the performance analysis in our scenario.
We first review the notion of the block-RIP [20] as follows:

Definition 3 (Block Restricted Isometry Property [20]):
Matrix $\Phi \in \mathbb{C}^{M \times N}$ satisfies the $k$-th order block-RIP with
block size $d$ ($d \mid N$, $K = N/d$) and block-RIP constant $\delta_{k|d}$, if
$0 \le \delta_{k|d} < 1$ and

    $\delta_{k|d} = \min\{\delta : (1-\delta)\|x\|_2^2 \le \|\Phi x\|_2^2 \le (1+\delta)\|x\|_2^2, \ \forall x : |\mathrm{supp}_d(x)| \le k\}$

where $\mathrm{supp}_d(x) = \{n : \|x[n]\| > 0, \ 1 \le n \le K\}$ with $x[n]$
denoting the $n$-th block of $x$ (with block size $d \times 1$) [20].

Note that when $d = 1$, the block-RIP reduces to the
conventional RIP [?]. In the following analysis, we assume
that the measurement matrix $\Phi$ has block-RIP properties with
$\delta_{k|d}$ denoting the $k$-th order block-RIP constant of $\Phi$. We
first introduce the following inequalities over the block-RIP
by extending conventional results [?], [?], [?].

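Lemma 1 (Inequalities over the block-RIP): The following
inequalities are satisfied:

1) If $k_1 \le k_2$, then $\delta_{k_1|d} \le \delta_{k_2|d}$.

2) For support $T$ with $|T| \le k$, we have

    $1 - \delta_{k|d} \le \sigma_{\min}(\Phi_{[T]}^H \Phi_{[T]}) \le \sigma_{\max}(\Phi_{[T]}^H \Phi_{[T]}) \le 1 + \delta_{k|d},$    (9)

    $\|\Phi_{[T]}^\dagger\| \le \frac{1}{\sqrt{1 - \delta_{k|d}}}.$    (10)

3) For two disjoint supports $T_1$, $T_2$, where $|T_1| \le k_1$, $|T_2| \le k_2$,
$T_1 \cap T_2 = \emptyset$, we have

    $\sigma_{\max}(\Phi_{[T_1]}^H \Phi_{[T_2]}) \le \delta_{k_1+k_2|d}.$    (11)

4) Suppose the chunk support of $X$ is $T_1$. Suppose $T_1$,
$T_2$ are two disjoint supports where $|T_1| \le k_1$, $|T_2| \le k_2$,
$T_1 \cap T_2 = \emptyset$. Denote the projection matrix $P(T_2)$ as
$P(T_2) = \Phi_{[T_2]} \Phi_{[T_2]}^\dagger$. Then,

    $\|P(T_2) \Phi X\|_F \le \delta_{k_1+k_2|d} \sqrt{1 + \delta_{k_1+k_2|d}} \, \|X\|_F.$

Proof: See Appendix A. ■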
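As a numerical illustration of Definition 3, the sketch below estimates the block-RIP constant of a given matrix by random sampling. It assumes the squared-norm form of the block-RIP, under which the constant equals the largest deviation from 1 of the eigenvalues of $\Phi_{[T]}^H \Phi_{[T]}$ over all supports of $k$ blocks. The function name and the sampling scheme are hypothetical; since only some supports are sampled, the result is a lower bound on the true constant, not its exact value.

```python
import numpy as np

def block_rip_lower_bound(Phi, k, d, trials=200, seed=0):
    """Monte Carlo lower bound on the k-th order block-RIP constant of Phi.

    Samples random supports of k blocks (block size d) and records the worst
    deviation from 1 of the eigenvalues of the Gram matrix of the selected
    columns. The true constant is a maximum over all supports, so sampling
    can only under-estimate it.
    """
    rng = np.random.default_rng(seed)
    K = Phi.shape[1] // d
    worst = 0.0
    for _ in range(trials):
        blocks = rng.choice(K, size=k, replace=False)
        cols = np.concatenate([np.arange(b * d, (b + 1) * d) for b in blocks])
        sub = Phi[:, cols]
        eig = np.linalg.eigvalsh(sub.conj().T @ sub)   # ascending order
        worst = max(worst, abs(eig[0] - 1.0), abs(eig[-1] - 1.0))
    return worst

# Example: a Gaussian matrix with columns of unit expected energy.
rng = np.random.default_rng(1)
Phi_demo = rng.standard_normal((40, 60)) / np.sqrt(40)
estimates = {k: block_rip_lower_bound(Phi_demo, k, d=2) for k in (1, 2, 3)}
```

A matrix with orthonormal columns gives an estimate of zero for every order, which is a quick sanity check of the routine.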


B. Performance Analysis of the Proposed M-SP

Using the properties in Lemma 1, we obtain the following
property regarding the residue matrix $R_{(l+1)}$ and the estimated
signal $\hat{X}_{(l+1)}$ in the $l$-th iteration of Algorithm 1.

Lemma 2 (Iteration Property in Algorithm 1): In the $l$-th
iteration ($l \ge 1$) in Step 2 of Algorithm 1, the following
inequalities are satisfied:

IlffC^ + l) IIf — ll^(b ll_F 


x-x, 


(Z+1)||^ < (Cl) 


1+1 


1 -f 5s 


1-5 


31 \d 


l|X|i^ + CsjZjr; (13) 


where $\eta = \|N\|_F$ is the noise magnitude, and $C_1$, $C_2$ and $C_3(l)$
are expressed in Table III.

Proof: See Appendix B. ■

Note that equations (12)-(13) in Lemma 2 are very important
to derive the distortion bound and the convergence speed
in Theorems 1 and 2, respectively. For instance, if $C_1 < 1$,
then the distortion (i.e., $\|X - \hat{X}_{(l)}\|_F$ in (13)) tends to decrease


Cl 




C2 

,, ( 2 «,2|d 1 2 v'l+'Ss|d\ , 

+3(0 


C4 

(1-C1-I-C2) 

Id 

where

$s_1 \triangleq 2s + \min(0, |T_0| - 2 s_c)$

$s_2 \triangleq 3s + \min(0, |T_0| - 3 s_c)$


Table III

The detailed expressions for constants.


exponentially with ratio $C_1$ in the iterations of the proposed
M-SP. Based on Lemma 2, we obtain the following recovery
distortion bound for the proposed M-SP algorithm.

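Theorem 1 (Recovery Distortion Bound): Suppose the $s_2$-th
order block-RIP constant $\delta_{s_2|d}$ satisfies $\delta_{s_2|d} < 0.246$. The
following properties are true regarding Algorithm 1:
(i) The final obtained solution $\hat{X}$ satisfies

    $\|X - \hat{X}\|_F \le \max\left( C_4 \eta, \ \frac{\gamma + \eta}{\sqrt{1 - \delta_{s_1|d}}} \right).$    (14)

(ii) If the signal $X$ satisfies
$\min_{k \in T} \|X[k]\|_F > \max\left( C_4 \eta, \ \frac{\gamma + \eta}{\sqrt{1 - \delta_{s_1|d}}} \right)$,
then the final obtained solution $\hat{X}$ further satisfies

    $\|X - \hat{X}\|_F \le \frac{1}{\sqrt{1 - \delta_{s|d}}} \, \eta,$    (15)

where $\gamma$ is the threshold parameter in Algorithm 1, and $s_1$, $s_2$ and
$C_4$ are given in Table III.

Proof: See the appendix. ■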

Remark 3 (Interpretation of Theorem 1): Note that $\delta_{s_2|d} <
0.246$ is to ensure $C_1 < 1$ in (13). When there is no
noise in the system, i.e., $\eta = \|N\|_F = 0$, and $\gamma$ is set to
be $\gamma = 0$, perfect signal recovery, i.e., $\hat{X} = X$, is
achieved from (14). Based on Theorem 1, we have the
following discussion regarding the proposed M-SP algorithm:
• Backward Compatibility with Conventional SP [?],
[27]: Note that when $d = 1$, $L = 1$ and $s_c = 0$, the
proposed M-SP reduces to the conventional SP
[?] (except that we have more sophisticated stopping
conditions, as explained in the footnote to Remark 2). In such a
scenario, the requirement on the RIP constant in Theorem 1
becomes $\delta_{3s} < 0.246$, which is slightly better (a slightly
weaker requirement) than the best bound known so far
($\delta_{3s} < 0.2412$) derived for SP in Thm. 3.8 of [27].
This is because we have combined the techniques in these
pioneering works [?], [?], [?] to derive Lemma 2 and
Theorem 1 (please refer to Appendix B for the details).
A detailed comparison between the proposed M-SP and
the conventional SP [?] is given in Table II.
• How Prior Support Quality $s_c$ Affects Performance:
From Theorem 1, a larger $s_c$ (a higher quality of the prior
support $T_0$) would achieve a better CS recovery performance.
For instance, suppose $\Phi$ is an i.i.d. sub-Gaussian
random matrix. From [?], the number of measurements
$M$ to achieve the $s_2$-th order block-RIP with
$\delta_{s_2|d} = \delta$ is given by
$M = O(s_2 d \ln \delta^{-1} + \delta^{-1} s_2 \log K)$.
On the other hand, $s_2 = 3s + \min(0, |T_0| - 3 s_c)$ is
monotonically decreasing as $s_c$ increases when
$|T_0|/3 \le s_c \le |T_0|$. Therefore, a larger $s_c$ would lead to a
weaker requirement on the number of measurements $M$
to achieve the desired performance in Theorem 1.
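The dependence of the required block-RIP order on the prior support quality can be checked with a few lines of arithmetic. The sketch below only evaluates the expression $s_2 = 3s + \min(0, |T_0| - 3 s_c)$ from Table III; the function name and the example values $s = 10$, $|T_0| = 9$ are chosen here for illustration.

```python
def rip_order_s2(s, prior_size, sc):
    """s_2 = 3s + min(0, |T_0| - 3*s_c), with prior_size standing for |T_0|."""
    return 3 * s + min(0, prior_size - 3 * sc)

# Example values (assumed): s = 10 and |T_0| = 9, with s_c ranging over 0..|T_0|.
s_demo, prior_size_demo = 10, 9
orders = [rip_order_s2(s_demo, prior_size_demo, sc) for sc in range(prior_size_demo + 1)]
```

For these values the order stays at $3s = 30$ while $s_c \le |T_0|/3 = 3$ and then drops by 3 for each further unit of $s_c$, reaching 12 at $s_c = |T_0| = 9$, so a better prior support means the block-RIP only needs to hold at a lower order.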

We further have the following result regarding the convergence
speed of the proposed M-SP algorithm (Algorithm 1).

Theorem 2 (Convergence Speed): Denote p as the total sig- 

, ,2 

nal energy, i.e., p = ||X||^. Suppose p > ( 

7 > i^-Ci • ^S 2 \d < 0.246, then Step 2 of Algorithm will 

stop with no more than $n_{co}$ iterations, where $n_{co}$ is given by

_C2ri_ 

—7 . __ nfi'i 

Proof: See Appendix [G| ■ 

Remark 4 (Interpretation of Theorem 2): Theorem 2 gives
an upper bound on the number of iterations in the proposed
M-SP. Compared with the conventional SP [?], our derived
convergence result further covers the cases with measurement
noise, which are not discussed for the conventional SP [?] (see
Table II for the detailed comparison). On the other hand,
from Theorem 2, we obtain that Algorithm 1 converges
in $O(\log \mathrm{SNR})$ steps in the high-SNR regime
(i.e., $\mathrm{SNR} = \rho / \eta^2 \to \infty$).



V. Robustness to Model Mismatch 

In Section III, we have proposed an M-SP algorithm to
exploit the prior support $T_0$ adaptively based on the support
quality information $s_c$. However, in practice, there may be
cases with incorrect statistical information $s_c$, i.e., $|T_0 \cap T| <
s_c$. In such scenarios, the proposed M-SP may perform badly.
We use the example below to illustrate this fact.

Example 1 (Algorithm 1 with Model Mismatch): Consider the case where the
quality information $s_c$ wrongly indicates the quality of the
prior support $T_0$, i.e., $|T_0 \cap T| < s_c$, and $|T| = s$ (i.e., fewer
than $s_c$ indices in $T$ are from $T_0$). With the proposed
M-SP algorithm, from Step 2C, there will always be $s_c$
indices selected from $T_0$, while Algorithm 1 will select no
more than $s - s_c$ indices from $\{1,..,K\} \setminus T_0$. Consequently,
the final identified signal support $\hat{T}$ will always be incorrect.
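The argument in Example 1 can be checked by brute force on a toy instance. The sketch below enumerates every support that the two-part selection rule of Step 2C can output ($s_c$ chunks from $T_0$ plus $s - s_c$ chunks from the remaining indices) and tests whether the true support is among them. The function name and the toy values are hypothetical.

```python
from itertools import combinations

def reachable_supports(K, s, T0, sc):
    """All supports of the form T1 | T2 with T1 in T0, |T1| = sc, |T2| = s - sc."""
    out = set()
    for T1 in combinations(sorted(T0), sc):
        rest = sorted(set(range(1, K + 1)) - set(T1))
        for T2 in combinations(rest, s - sc):
            out.add(frozenset(T1) | frozenset(T2))
    return out

# Toy instance: K = 6, s = 2, true support {1, 2}, prior support {2, 3},
# so only one index of the prior support is correct.
K_toy, s_toy = 6, 2
T_true_toy = frozenset({1, 2})
T0_toy = {2, 3}
with_mismatch = T_true_toy in reachable_supports(K_toy, s_toy, T0_toy, sc=2)
without_mismatch = T_true_toy in reachable_supports(K_toy, s_toy, T0_toy, sc=1)
```

With the overstated quality `sc=2` the true support is not reachable at all, whereas with the correct value `sc=1` it is, which matches the conclusion of Example 1.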

From the above example, the performance of the proposed
M-SP is sensitive to model mismatch with incorrect $s_c$. In
this section, we further propose a conservative M-SP
approach which is robust to scenarios with possible
model mismatch.

Challenge 3: Robust algorithm design to combat model mismatch
with incorrect prior support information $s_c$.


^Note that the randomized approach is a commonly adopted method to
generate the CS measurement matrix for a good RIP property [20].


Cs 

1 d 1 d 1 ^+4-5^3 1 rf 1 d 

(l-^.,|d)^ 

Ce 

o / 1 /TTTTTi/l 1 

( ^^3S+s^\d 1 2.^1+«s|d\ , 


C7 

(l-C’s+C’e) 

Cl~C'5)\/l — '^25|d 

where 

S3 = 3s -1- Sc -f min (0, To - To fl T* * “ sf) 


Table IV 

The detailed expressions for constants. 


Algorithm 2 Conservative M-SP to Solve Challenge 3.
Obtained from Algorithm 1 with Step 2A and Step 2C
replaced by the following substeps, respectively:


• Step 2A (Support Merge): Set s 
merge Ta = fi{}Tb{jTc, where 



and 


% 


fargmax|7-3|=j^-7-icro 


l0 



Tc = arg max 
\T 2 \ = -S 




s > 0 

s < 0 ’ 
(IT) 

(18) 


• Step 2C (Support Refinement): Select T/+i = 
arg max| 7 -|^s 


A. Proposed Conservative M-SP Algorithm 

The conservative M-SP algorithm is obtained by redesigning
two substeps in Step 2 of Algorithm 1. The details are given
in Algorithm 2.

Remark 5 (Interpretation of Algorithm 2): Note that in
Step 2A of the conservative M-SP, the newly added support
contains two parts, $T_b$ and $T_c$, where $T_b$ is selected from $T_0$

with size Sr — 




(compared with $s_c$ in M-SP), and $T_c$
is selected from the entire index space $\{1,..,K\}$ with size
$s$ (compared with size $s - s_c$ selected from $\{1,..,K\} \setminus T_0$ in
M-SP). These designs give us opportunities to further search
for support outside $T_0$ when the information of $s_c$ is incorrect
(i.e., $|T_0 \cap T| < s_c$). On the other hand, in Step 2C of the
conservative M-SP, the updated support $\hat{T}_{l+1}$ with size $s$ is
selected from the entire index set $\{1,..,K\}$ (compared with
the two-part structure in M-SP). Using this design, even if $s_c$
wrongly indicates the quality of $T_0$, we still have chances to
correctly identify the signal support. Note that the proposed
conservative M-SP still exploits the prior support information
but in a conservative way:

• Exploitation of Prior Support $(T_0, s_c)$: For instance,
in Step 2A, equation (17) ensures that the selected support
candidate $T_a$ contains at least $s_c$ indices from $T_0$.

• Conservativeness in Exploiting $(T_0, s_c)$: Compared
with the original M-SP, the proposed Algorithm 2 exploits
$(T_0, s_c)$ in a much more conservative way. First, in $T_a$
obtained in Step 2A, although $T_0$ has already contributed
$s_c$ indices, another $s$ indices are further selected from
the entire index space $\{1,..,K\}$ in (18). Second, the
refined support $\hat{T}_{l+1}$ is obtained from the entire index
space $\{1,..,K\}$ based on the maximum correlation criterion
as in Step 2C (instead of always selecting $s_c$ indices
from $T_0$ as in the original M-SP). These designs allow
opportunities to search for the signal support outside $T_0$.
As a result, the proposed conservative M-SP does not
rely on $(T_0, s_c)$ wholeheartedly and hence exploits
$(T_0, s_c)$ in a more conservative way (compared with
the M-SP).

Recall Example 1 with model mismatch (i.e., $|T_0 \cap T| <
s_c$). Using the conservative M-SP, both Step 2A and Step 2C
would select $s$ indices from the entire index space $\{1,..,K\}$
based on the maximum correlation criterion [?]. Therefore, the
conservative M-SP has a chance to identify more than $s - s_c$
indices from $\{1,..,K\} \setminus T_0$, and it is still likely that the correct
support $T$ can be identified. Hence, the conservative M-SP
is robust to model mismatch with incorrect $s_c$. We formally
discuss this fact in the next section.


B. Performance Analysis of Conservative M-SP 

In this section, we analyze the recovery performance
of the proposed conservative M-SP. Specifically, we give
results similar to those in Section IV, except that the
results derived in this section do not require the assumption that the
information $s_c$ is correct.

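Theorem 3 (Distortion Bound of Conservative M-SP):
Suppose the $s_3$-th order block-RIP constant $\delta_{s_3|d}$ satisfies
$\delta_{s_3|d} < 0.246$. We obtain the following results regarding
Algorithm 2:

(i) The obtained solution $\hat{X}$ satisfies

$$\|X - \hat{X}\|_F \le \max\left( C_7 \eta,\ \frac{\gamma + \eta}{\sqrt{1 - \delta_{2s|d}}} \right). \tag{19}$$

(ii) If $X$ satisfies
$\min_{k \in \mathcal{T}} \|X_{[k]}\|_F > \max\left( C_7 \eta,\ \frac{\gamma + \eta}{\sqrt{1 - \delta_{2s|d}}} \right)$,
then $\hat{X}$ further satisfies

$$\|X - \hat{X}\|_F \le \frac{\eta}{\sqrt{1 - \delta_{s|d}}}, \tag{20}$$

where $s_3$, $C_5$, $C_6$, $C_7$ depend on the block-RIP constants and
are given in Table IV.

Proof: See Appendix H. ■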

Theorem 4 (Convergence Speed of Conservative M-SP):
Denote $\rho$ as the signal energy, i.e., $\rho = \|X\|_F$. Suppose
$\rho > \ldots$ and $\gamma > \ldots$. If $\delta_{s_3|d} < 0.246$, then
in Algorithm 2, Step 2 will stop within no more than $n_{co}$
iterations, where $n_{co}$ is given by

$$n_{co} = \log_{C_5} \left( \frac{\gamma - \frac{C_6 \eta}{1 - C_5}}{\sqrt{1 + \delta_{s|d}}\,\rho + \eta - \frac{C_6 \eta}{1 - C_5}} \right). \tag{21}$$

Proof: (Sketch) The proof is similar to Appendix G and
is therefore omitted to avoid duplication. ■

Remark 6 (Interpretation of Theorems 3 and 4): Different from
the theoretical results derived for M-SP in Section IV, Theorems
3 and 4 for the proposed conservative M-SP (Algorithm 2) do
not depend on the assumption of correct quality information
$s_c$ (i.e., they hold no matter whether $|\mathcal{T}_0 \cap \mathcal{T}| \ge s_c$ is true or not).
These results demonstrate the robustness of the proposed
conservative M-SP towards model mismatch with incorrect $s_c$.
Note that compared with the M-SP, there is an increase in the
requirement of the block-RIP conditions, as can be seen from
the expression of $s_3$ in Table IV (i.e., $s_3 \ge s_2$). This
is due to the conservative exploitation of $(\mathcal{T}_0, s_c)$ in Algorithm 2,
such that in Step 2A a larger support candidate is involved
in the signal support identification.

VI. Application to Sparse Channel Estimation in Massive MIMO

In this section, we apply the proposed framework of
CS to the channel estimation problem in massive MIMO [28]
with temporal correlation. One key challenge in implementing
massive MIMO is to efficiently obtain the channel state information
at the transmitter (CSIT). Recently, it has been shown
that the massive MIMO channel is sparse due to the limited
local scatterers effect [29], [30], and hence CS techniques are
deployed to reduce the CSI acquisition overhead by exploiting
the channel sparsity. For instance, in [31], CS techniques are
deployed to improve the channel feedback efficiency, and in
[32], a distributed CS framework is proposed to enhance both
the channel estimation and feedback performance in downlink
massive MIMO systems. Besides, works [33] and [34] further
consider uplink massive MIMO systems, and a CS-based low-rank
approximation scheme and a sparse Bayesian-learning
algorithm, respectively, are proposed to improve the channel
recovery performance. However, these existing approaches
[29], [30] only consider a one-time-slot static scenario. In
massive MIMO systems with temporally correlated multipaths
(as illustrated in Figure 3), it is desirable to exploit the
channel temporal correlation to further reduce the required
pilot overhead. In this section, we achieve this goal by
applying the proposed framework of CS recovery with prior
support information.

A. System Model

Consider a flat block-fading FDD massive MIMO system
with one BS and one UE, where the BS and UE have $M$
($M$ is large) and $N$ antennas respectively. To estimate the
downlink channel from the BS to the UE, the BS sends a
sequence of $T$ training pilot symbols on its $M$ antennas.
Denote the transmitted pilot training matrix as $\Theta \in \mathbb{C}^{M \times T}$,
where $\mathrm{tr}(\Theta \Theta^H) = T$. The corresponding received signal at
the UE, $Z \in \mathbb{C}^{N \times T}$, is

$$Z = \sqrt{P}\, H \Theta + W \tag{22}$$

where $P$ denotes the transmitted SNR from the BS, $H \in \mathbb{C}^{N \times M}$
is the quasi-static channel from the BS to the UE, and
$W \in \mathbb{C}^{N \times T}$ is the channel noise whose elements are i.i.d.
complex Gaussian variables with zero mean and unit variance.
Our target is to estimate the channel matrix $H$ based on the
obtained channel observations $Z$ at the UE. We first elaborate
the considered channel model in the next subsection.

























[Figure 2 plot: channel magnitude in the angular domain.]

Figure 2. Illustration of the angular domain channel with the ITU-R IMT-Advanced
model in the Urban Micro scenario [36]. The numbers of antennas at
the BS and UE are 70 and 2, respectively. As can be seen, (i) the angular
domain channel is sparse; (ii) the angular domain channels on different receive
directions share a common channel support.

Figure 3. Illustration of a point-to-point massive MIMO system in which the
previous and current frames share some common spatial channel paths $\mathcal{T}_0 \cap \mathcal{T}$
due to the slowly varying scattering environment. As such, the estimated
channel support $\mathcal{T}_0$ in the previous frame can be utilized to enhance the
CSIT estimation performance in the current frame.

B. Channel Model

Consider a uniform linear array (ULA) model for the
antennas installed at the BS and UE. The channel matrix $H$
can be represented [35] as

$$H = U H_a V^H$$

where $U \in \mathbb{C}^{N \times N}$ and $V \in \mathbb{C}^{M \times M}$ denote the unitary
matrices for the angular domain transformation at the UE and
BS side respectively, and $H_a \in \mathbb{C}^{N \times M}$ is the angular domain
channel matrix. In massive MIMO systems, due to the limited
local scattering at the BS side, the angular domain channel $H_a$
turns out to be sparse. Furthermore, as the UE has a relatively
rich number of local scatterers compared with its number of
antennas, the angular domain channel $H_a$ has simultaneously zero or
non-zero columns, as illustrated in Figure 2, which shows
simulated results of the angular domain channel using the
ITU-R IMT-Advanced channel model [36]. Based on these features and
similar to existing works, we consider the following channel model
for our point-to-point massive MIMO system. Denote
$\mathrm{supp}(h) = \{i : h(i) \neq 0\}$.

Definition 4 (Massive MIMO Channel Model): Let $h_j \in \mathbb{C}^{1 \times M}$
be the $j$-th row vector of $H_a \in \mathbb{C}^{N \times M}$. The channel
matrix $H_a$ satisfies $\mathrm{supp}(h_1) = \cdots = \mathrm{supp}(h_N) = \mathcal{T}$, where $\mathcal{T}$
is the channel support and $|\mathcal{T}| \le s$. Furthermore, the elements
of $H_a$ on the support $\mathcal{T}$ are i.i.d. complex Gaussian variables with zero mean
and unit variance.

Note that $s$ is a statistical upper bound on the number
of spatial paths from the BS to the UE. In practice, the
channel sparsity level depends on the large scale properties
of the scattering environment and changes slowly, and hence
information like $s$ can be obtained at the UE from prior
offline measurements. On the other hand, the channel paths
are temporally correlated, so that consecutive frames
share some common channel paths. As a result, we can
utilize the prior support information (see Section II) in massive
MIMO scenarios. Specifically, in the prior support information
$(\mathcal{T}_0, s_c)$, $\mathcal{T}_0$ is the estimated channel support in the previous
frame and $s_c$ characterizes the size of the common channel paths
between $\mathcal{T}_0$ and $\mathcal{T}$, i.e., $|\mathcal{T}_0 \cap \mathcal{T}| \ge s_c$.
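As a small illustration of Definition 4, the following sketch draws an angular domain channel whose rows share one support and maps it to the antenna domain. Using unitary DFT matrices for $U$ and $V$ is an assumption made for the example (a common choice for half-wavelength ULAs); the sizes and names are hypothetical.

```python
import numpy as np

def dft_unitary(n):
    """Unitary DFT matrix, assumed here as the angular-domain transform."""
    return np.fft.fft(np.eye(n)) / np.sqrt(n)

def angular_channel(N, M, support, rng):
    """H_a with i.i.d. CN(0, 1) entries on the columns in `support`, zero elsewhere."""
    Ha = np.zeros((N, M), dtype=complex)
    k = len(support)
    Ha[:, support] = (rng.standard_normal((N, k))
                      + 1j * rng.standard_normal((N, k))) / np.sqrt(2)
    return Ha

rng = np.random.default_rng(0)
N, M, s = 2, 64, 6                                  # hypothetical sizes
T = np.sort(rng.choice(M, size=s, replace=False))   # common channel support
Ha = angular_channel(N, M, T, rng)
U, V = dft_unitary(N), dft_unitary(M)
H = U @ Ha @ V.conj().T                             # H = U H_a V^H
```

Every row of `Ha` has the same support `T`, so the non-zero pattern is shared across receive directions, while `H` itself is generally dense.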


Remark 7 (Practical Considerations): In practice, we usually
need to estimate a sequence of channels $H_{[1]}, H_{[2]}, \dots$,
where $H_{[i]}$ is the channel from the BS to the UE in the $i$-th
frame [36]. At the very beginning, we do not have prior
channel estimates, and hence we can set the prior channel
support information $(\mathcal{T}_0, s_c)$ to be $\mathcal{T}_0 = \emptyset$, $s_c = 0$. At later
stages, when we have already obtained some prior channel
estimates, the estimated channel support in the previous
frame (e.g., of $H_{[i-1]}$) can act as the prior support $\mathcal{T}_0$ for the
present frame (e.g., for $H_{[i]}$). On the other hand, due to the slowly
varying propagation environment between the BS and UE
(as illustrated in Figure 3), it is likely that the size of the
common support between consecutive channels, i.e., $H_{[i-1]}$ and
$H_{[i]}$, changes slowly, so that we can gradually obtain reliable
statistical information as $s_c$. For instance, we can select $s_c$
to satisfy $\Pr(|\mathcal{T}_{i-1} \cap \mathcal{T}_i| \ge s_c) \ge 1 - \epsilon$ for some small $\epsilon$,
$0 < \epsilon < 1$, from prior channel measurements based on long
term stochastic learning and estimation [37]. Note that a larger
$s_c$ indicates a stronger temporal correlation between channels
of consecutive frames. ■
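One simple way to pick $s_c$ from past data, in the spirit of the rule $\Pr(|\mathcal{T}_{i-1} \cap \mathcal{T}_i| \ge s_c) \ge 1 - \epsilon$, is to use the empirical frequency over previously observed overlap sizes. The sketch below is an assumption for illustration (it is not the stochastic learning procedure of [37]); the function name and the sample history are hypothetical.

```python
import numpy as np

def learn_s_c(overlaps, eps):
    """Largest s_c whose empirical Pr(overlap >= s_c) is at least 1 - eps."""
    overlaps = np.asarray(overlaps)
    best = 0
    for cand in range(int(overlaps.max()) + 1):
        if np.mean(overlaps >= cand) >= 1.0 - eps:
            best = cand
    return best

# Hypothetical history of |T_{i-1} ∩ T_i| over ten past frames.
history = [10, 11, 12, 10, 11, 9, 12, 10, 11, 10]
s_c_loose = learn_s_c(history, eps=0.2)   # tolerates 20% violations
s_c_strict = learn_s_c(history, eps=0.0)  # must hold for every observed frame
```

A smaller $\epsilon$ yields a smaller, safer $s_c$, which trades exploitation gain for a lower risk of the model mismatch discussed in Section V.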


C. Channel Recovery with the Proposed CS Framework 


In this subsection, we discuss how to apply the proposed
CS framework to recover $H$ from the observations $Z$.

Challenge 4: Apply the proposed framework of CS with prior
support information in Section II to conduct the recovery of
$H$ from $Z$.


First, equation (22) can be rewritten as

$$\underbrace{Z^H U}_{Y} = \underbrace{\sqrt{\tfrac{M}{T}}\,(V^H \Theta)^H}_{\Phi}\ \underbrace{\sqrt{\tfrac{PT}{M}}\,H_a^H}_{X} + \underbrace{W^H U}_{N}. \tag{23}$$

Then (23) matches the CS measurement model in (3), where
$Z^H U$ are the measurements (role of $Y$ in (3)), $\sqrt{M/T}\,(V^H \Theta)^H$ is
the measurement matrix ($\Phi$ in (3)), $W^H U$ is the noise ($N$
in (3)) and $\sqrt{PT/M}\,H_a^H$ is the unknown signal source ($X$ in
(3)). Furthermore, $\sqrt{PT/M}\,H_a^H$ satisfies the general sparsity
model in Section II-B with chunk size $1 \times N$ ($d = 1$, $L = N$
as in Definition 1). As such, the channel recovery problem is
transformed into the CS problem we consider in Section II.

Second, based on the transformed CS equation (23), we
apply the proposed M-SP (Algorithm 1) to conduct the channel
recovery by replacing the input parameter $Y$ with $Z^H U$ and $\Phi$
with $\sqrt{M/T}\,(V^H \Theta)^H$, with $d$ set to $d = 1$. Denote the obtained
algorithm output as $\hat{X}$. Then the recovered channel $\hat{H}$ for $H$
is given by

$$\hat{H} = \sqrt{\frac{M}{PT}}\; U \hat{X}^H V^H. \tag{24}$$
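The change of variables in (22)-(24) can be checked numerically. The sketch below builds a toy instance and verifies that $Z^H U = \Phi X + W^H U$ and that (24) returns $H$ when the exact $X$ is plugged in. The DFT choice for $U$ and $V$, the binary pilots and all sizes are assumptions for illustration only; no recovery algorithm is run here.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, T_len, P = 2, 32, 16, 100.0   # hypothetical sizes; P is the linear transmit SNR

def cgauss(shape):
    """i.i.d. complex Gaussian entries with zero mean and unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def dft_unitary(n):
    """Unitary DFT matrix, assumed as the angular-domain transform."""
    return np.fft.fft(np.eye(n)) / np.sqrt(n)

U, V = dft_unitary(N), dft_unitary(M)
Ha = np.zeros((N, M), dtype=complex)
Ha[:, [3, 7, 20]] = cgauss((N, 3))                 # jointly sparse angular channel
H = U @ Ha @ V.conj().T

Theta = rng.choice([-1.0, 1.0], size=(M, T_len)) / np.sqrt(M)   # tr(Theta Theta^H) = T
W = cgauss((N, T_len))
Z = np.sqrt(P) * H @ Theta + W                     # received signal, equation (22)

Y = Z.conj().T @ U                                 # measurements in (23)
Phi = np.sqrt(M / T_len) * Theta.conj().T @ V      # measurement matrix in (23)
X = np.sqrt(P * T_len / M) * Ha.conj().T           # unknown signal in (23)
Noise = W.conj().T @ U                             # noise term in (23)

# Equation (24) applied to the exact X (i.e., pretending X_hat = X).
H_back = np.sqrt(M / (P * T_len)) * U @ X.conj().T @ V.conj().T
```

Note that `X` has shape $M \times N$ with only three non-zero rows, which is the row-sparse structure with chunk size $1 \times N$ that M-SP is designed for.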


Third, we deploy the analytical results in Section IV to
derive some performance results for $\hat{H}$. Note that when $d = 1$,
the block-RIP reduces to the conventional RIP. Suppose
that the measurement matrix $\Phi = \sqrt{M/T}\,\Theta^H V$ satisfies the RIP property, and
denote the corresponding $k$-th order RIP constants as $\delta_k$ (note
that $\delta_k = \delta_{k|1}$ as $d = 1$). Based on Theorem 1 and the
unitary invariance property of the Frobenius norm, we obtain the
following distortion bound.

Theorem 5 (Channel Recovery Performance): If the $s_2$-th
order RIP constant of $\Phi = \sqrt{M/T}\,\Theta^H V$ satisfies $\delta_{s_2} < 0.246$,
where $s_2 = 3s + \min(0, |\mathcal{T}_0| - 3 s_c)$, then the average channel
recovery distortion, i.e., $E(\|H - \hat{H}\|_F)$, satisfies

$$E\left(\|H - \hat{H}\|_F\right) \le \sqrt{\frac{M}{PT}} \left[ \left( C_4 + \frac{1}{\sqrt{1 - \delta_{2s}}} \right) \frac{\Gamma(NT + \frac{1}{2})}{\Gamma(NT)} + \frac{\gamma}{\sqrt{1 - \delta_{2s}}} \right], \tag{25}$$

where $\gamma$ is the threshold parameter in Algorithm 1, $\Gamma(\cdot)$ is the
gamma function, and $C_4$ is a constant given in Table III.


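Proof: From Theorem 1, equation (24), $\delta_{s_2} < 0.246$ and
$s_1 \le s_2$, we derive

$$\|H - \hat{H}\|_F = \sqrt{\frac{M}{PT}}\, \|X - \hat{X}\|_F \le \sqrt{\frac{M}{PT}} \left[ \left( C_4 + \frac{1}{\sqrt{1 - \delta_{2s}}} \right) \eta + \frac{\gamma}{\sqrt{1 - \delta_{2s}}} \right],$$

where $\eta = \|W^H U\|_F = \|W\|_F$. From this and
$E(\eta) = E(\|W\|_F) = \frac{\Gamma(NT + \frac{1}{2})}{\Gamma(NT)}$, equation (25)
is derived. ■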
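The identity $E(\|W\|_F) = \Gamma(NT + \frac{1}{2}) / \Gamma(NT)$ used in the proof follows because $\|W\|_F^2$ is Gamma distributed with shape $NT$ when the entries of $W$ are i.i.d. complex Gaussian with unit variance. The short sketch below compares the closed form with a Monte Carlo estimate; the sample size and the values of $N$ and $T$ are arbitrary choices for illustration.

```python
import math
import numpy as np

def expected_noise_norm(N, T):
    """E||W||_F for an N x T matrix of i.i.d. CN(0, 1) entries."""
    k = N * T
    # Use log-gamma to avoid overflow of Gamma(k) for large k.
    return math.exp(math.lgamma(k + 0.5) - math.lgamma(k))

rng = np.random.default_rng(0)
N, T, trials = 2, 52, 5000
W = (rng.standard_normal((trials, N * T))
     + 1j * rng.standard_normal((trials, N * T))) / np.sqrt(2)
monte_carlo = np.linalg.norm(W, axis=1).mean()   # one Frobenius norm per trial
closed_form = expected_noise_norm(N, T)
```

By Jensen's inequality the closed form is slightly below $\sqrt{NT}$, and for large $NT$ the two are nearly equal.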

From Theorem 5, as the transmit SNR $P \to \infty$, the
average recovery distortion $E(\|H - \hat{H}\|_F) \to 0$ and perfect
channel recovery will be achieved. On the other hand, from
the expression $s_2 = 3s + \min(0, |\mathcal{T}_0| - 3 s_c)$, $s_2$ decreases
as $s_c$ increases when $s_c \ge \frac{1}{3}|\mathcal{T}_0|$. In other words, a weaker
RIP condition on the measurement matrix $\Phi = \sqrt{M/T}\,\Theta^H V$ is required
as $s_c$ increases (e.g., we need $\delta_{3s} < 0.246$ for $s_c = 0$ and
$\delta_s < 0.246$ for $s_c = |\mathcal{T}_0| = s$). This leads to a smaller
requirement on the number of training pilots $T$. From this,
we conclude that a stronger temporal correlation on
the channel support (i.e., a larger $s_c$) allows a larger reduction
in the number of training pilots in massive MIMO systems. On
the other hand, if we apply the conservative M-SP (Algorithm
2) instead of M-SP (Algorithm 1) to conduct the channel
recovery, we can obtain a similar recovery performance
result as in Theorem 5 by deploying Theorem 3 (details are
omitted to avoid duplication).

Note that the term $\sqrt{M/T}$ normalizes the measurement matrix $\Phi = \sqrt{M/T}\,\Theta^H V$
to satisfy $\mathrm{tr}(\Phi \Phi^H) = M$ so as to fit into the analytical
framework of the block-RIP property.


D. Discussion on the Pilot Matrix $\Theta$

Note that we have not yet discussed how to design the pilot
matrix $\Theta$ so that the aggregate measurement matrix $\Phi = \sqrt{M/T}\,\Theta^H V$
in (23) satisfies the RIP condition in Theorem 5.
In the CS literature, matrices randomly generated from sub-Gaussian
distributions can satisfy the RIP with overwhelming
probability, and this randomized generation method has
been widely used. Following this convention, the elements
of the pilot matrix $\Theta \in \mathbb{C}^{M \times T}$ can be generated from an i.i.d.
sub-Gaussian distribution (e.g., $\{-\sqrt{1/M}, \sqrt{1/M}\}$ with equal
probability). Using this method, when the length $T$
of the training pilot satisfies $T \ge c_1 k \log M$, the probability
that the CS measurement matrix $\Phi = \sqrt{M/T}\,\Theta^H V$ in (23) satisfies
a prescribed $k$-th order RIP condition $\delta_k \le \delta$ will be no less
than $1 - O(\exp(-c_2 T))$, where $c_1$ and $c_2$ are some positive
constants depending on $\delta$.
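The randomized pilot construction can be sketched as follows. The code draws binary pilots with entries $\pm 1/\sqrt{M}$ (an assumed instance of the sub-Gaussian choice, picked so that $\mathrm{tr}(\Theta \Theta^H) = T$), forms $\Phi$, and samples the energy ratio $\|\Phi x\|^2 / \|x\|^2$ on random $k$-sparse vectors. Sampling a few hundred vectors only gives a sanity check of near-isometry; it does not certify the RIP, which would require all supports of size $k$. The DFT matrix for $V$ and all sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
M, T_len, k = 128, 64, 4                       # hypothetical sizes

# Binary pilots in {-1/sqrt(M), +1/sqrt(M)}, so that tr(Theta Theta^H) = T.
Theta = rng.choice([-1.0, 1.0], size=(M, T_len)) / np.sqrt(M)
V = np.fft.fft(np.eye(M)) / np.sqrt(M)         # assumed unitary angular transform
Phi = np.sqrt(M / T_len) * Theta.conj().T @ V  # T x M measurement matrix

def energy_ratio(Phi, k, rng):
    """||Phi x||^2 / ||x||^2 for one random k-sparse complex vector x."""
    x = np.zeros(Phi.shape[1], dtype=complex)
    idx = rng.choice(Phi.shape[1], size=k, replace=False)
    x[idx] = rng.standard_normal(k) + 1j * rng.standard_normal(k)
    return np.linalg.norm(Phi @ x) ** 2 / np.linalg.norm(x) ** 2

ratios = np.array([energy_ratio(Phi, k, rng) for _ in range(200)])
```

The spread of `ratios` around 1 shrinks as `T_len` grows, which is the qualitative behaviour behind the condition $T \ge c_1 k \log M$.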


E. Discussion of Other Possible Applications 

In fact, the proposed framework can potentially be applied
to many other areas, including wireless sensor networks
(WSN) [38] and magnetic resonance imaging (MRI), in
which the target sparse signals usually demonstrate strong
temporal correlations. To apply the proposed scheme, one can
learn the statistical information $s_c$ (which characterizes the
size of the shared common support between two consecutive
signals) using the tools of stochastic learning and estimation
[37]. In this work, we have proposed two algorithms, namely
the M-SP and the conservative M-SP, to conduct the signal
recovery. For a specific application scenario, if the uncertainty
on $s_c$ is small (e.g., when the size of the common support between
consecutive signals changes slowly, one can learn [37] reliable
statistical information $s_c$ such that $\Pr(|\mathcal{T}_0 \cap \mathcal{T}| \ge s_c) \ge 1 - \epsilon$
for some small $\epsilon$, $0 < \epsilon < 1$), then one should use the M-SP algorithm
for better performance. On the other hand, if the underlying
uncertainty on the model parameter $s_c$ is large, then one would
prefer the conservative M-SP for robustness. The robustness of the
conservative M-SP with respect to model mismatch on $s_c$ is
illustrated in Figure 7 (elaborated in Section VII-D).


VII. Numerical Results

In this section, we consider the scenario of sparse channel
estimation in massive MIMO systems as in Section VI to
verify the effectiveness of the proposed framework. Specifically,
we compare the performance of the proposed M-SP and
conservative M-SP with the following baselines:

• Baseline 1 (SP): Deploy conventional SP to recover
the massive MIMO channel.

• Baseline 2 (Basis Pursuit): Deploy conventional basis
pursuit to recover the channel.

• Baseline 3 (Modified Basis Pursuit): Deploy the modified
basis pursuit from the literature to recover the channel with
blind exploitation of the prior support information.

• Baseline 4 (MMV-SP): Deploy an improved version of
the SP (corresponding to the proposed M-SP with $s_c = 0$)
that adapts to the general sparsity model but does not
exploit the prior support information.

• Baseline 5 (AMP-MMV): Deploy the approximate message
passing for multiple measurement vector problems
(AMP-MMV) to conduct the channel recovery.

• Baseline 6 (Genie-aided LS): This serves as a performance
upper bound scenario, in which the channel
support $\mathcal{T}$ is assumed to be known and we directly use
least squares to recover the channel coefficients on $\mathcal{T}$.

We consider a narrowband (flat fading) point-to-point
massive MIMO system with one BS and one UE, where
the BS and UE have $M = 200$ and $N = 2$ antennas,
respectively. Denote the average transmit SNR at the BS as
$P$. We use the 3GPP spatial channel model (SCM) to
generate the channel coefficients, and we consider that the UE
has a rich local scattering environment as in [40]. Denote the
channel to be estimated in the $i$-th frame as $H_{[i]}$ and denote its
corresponding channel support as $\mathcal{T}_i$. Suppose that the number
of spatial paths from the BS broadside (corresponding to $|\mathcal{T}_i|$)
is randomly generated as $|\mathcal{T}_i| \sim U(s - 2, s)$, $\forall i$, where
$U(a, b)$ denotes the discrete uniform distribution over the set of
integers $\{a, a + 1, \dots, b\}$. Consider a slowly varying scattering
scenario so that consecutive frames (i.e., $\mathcal{T}_i$, $\mathcal{T}_{i+1}$) share some
spatial channel paths with size $|\mathcal{T}_i \cap \mathcal{T}_{i+1}| \sim U(s_c, s_c + 2)$.
The threshold parameter $\gamma$ in the proposed M-SP and conservative
M-SP is given by $\gamma = \sqrt{2NT}$, where $T$ is the
length of the training pilots. In Baseline 2 and Baseline 3,
the threshold parameters in the constraint of the $\ell_1$-norm
minimization are also set to $\sqrt{2NT}$. In the following, we
compare the normalized mean squared error (NMSE) of
the estimated channel with $G = 1000$ channel realizations.
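For concreteness, the NMSE metric used throughout this section, i.e., the average over the $G$ realizations of $\|H_i - \hat{H}_i\|_F^2 / \|H_i\|_F^2$, can be computed as in the following sketch. The function name and the toy inputs are illustrative assumptions.

```python
import numpy as np

def nmse(H_true, H_est):
    """Average of ||H_i - H_hat_i||_F^2 / ||H_i||_F^2 over all realizations."""
    num = np.array([np.linalg.norm(h - g) ** 2 for h, g in zip(H_true, H_est)])
    den = np.array([np.linalg.norm(h) ** 2 for h in H_true])
    return float(np.mean(num / den))

# Toy check with G = 3 random realizations of a 2 x 8 channel.
rng = np.random.default_rng(0)
channels = [rng.standard_normal((2, 8)) + 1j * rng.standard_normal((2, 8))
            for _ in range(3)]
perfect = nmse(channels, channels)                      # exact estimates
half = nmse(channels, [0.5 * h for h in channels])      # estimates scaled by 1/2
```

Each realization is normalized by its own channel energy before averaging, so weak and strong channel realizations contribute equally to the metric.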

A. Channel Estimation Performance Versus Overhead T 

In Figure 4, we compare the normalized mean squared error
(NMSE) of the estimated channel versus the length of
the training pilot $T$, under transmit SNR $P = 25$ dB, channel
sparsity parameter $s = 18$, and prior channel quality parameter
$s_c = 10$. From this figure, we observe that the channel
estimation performance improves as $T$ increases, and the
proposed M-SP algorithm achieves a substantial performance
gain over Baselines 1-4. This is because the proposed M-SP
adaptively exploits the prior channel support based on its
quality parameter $s_c$, and it also adapts to the joint channel
sparsity structure as illustrated in Section VI. Specifically, the
performance gain of the M-SP over MMV-SP demonstrates the
advantage of adaptively exploiting the prior channel support,
and the performance gain of MMV-SP over SP indicates the
benefits of adapting to the joint sparsity structure. On the
other hand, note that the proposed conservative M-SP has a
smaller performance gain compared with the proposed M-SP.
This is because the conservative M-SP utilizes the prior
channel support in a more conservative manner and hence
achieves less exploitation gain.

*The NMSE of the estimated channel is computed as
$\frac{1}{G} \sum_{i=1}^{G} \frac{\|H_i - \hat{H}_i\|_F^2}{\|H_i\|_F^2}$, where $H_i$ and $\hat{H}_i$ are the actual channel and
the estimated channel in the $i$-th realization respectively, and $G$ is the number
of simulation realizations.

Figure 4. NMSE of estimated channel versus the pilot training length $T$
under $s = 18$, $s_c = 10$ and transmit SNR $P = 25$ dB.

Figure 5. NMSE of estimated channel versus transmit SNR under $T = 52$,
$s = 18$ and $s_c = 10$.

B. Channel Estimation Performance Versus Transmit SNR $P$

In Figure 5, we compare the NMSE of the estimated channel
versus the transmit SNR $P$ under $T = 52$, $s = 18$ and
$s_c = 10$. From this figure, we observe that the proposed M-SP
algorithm has a substantial performance gain over the baselines,
and a relatively larger performance gain is achieved in higher
SNR regions.

C. Channel Estimation Performance Versus Temporal Correlation Strength $s_c$

In Figure 6, we compare the NMSE of the estimated
channel versus the prior support quality parameter $s_c$ (which
indicates the strength of temporal correlation between channels
of consecutive frames) under $T = 52$, $s = 18$ and $P = 25$
dB. From this figure, we observe that the channel estimation
performance of the proposed M-SP and conservative M-SP
gets better as $s_c$ increases. This is because a larger $s_c$ means that
a larger part of the prior channel support can be exploited.
This simulation result also verifies the analysis in Section IV.

Figure 6. NMSE of estimated channel versus prior channel support quality
parameter $s_c$ under transmit SNR $P = 25$ dB, $T = 52$ and $s = 18$.

Figure 7. NMSE of estimated channel versus the believed $s_c$ under model
mismatch with fixed $|\mathcal{T}_i \cap \mathcal{T}_{i+1}| = 9$, $\forall i$. The other parameters are given
by: transmit SNR $P = 25$ dB, $T = 52$ and $s = 18$.


D. Channel Estimation Performance under Model Mismatch

In this subsection, we simulate the cases of model mismatch
with incorrect information of $s_c$, i.e., $|\mathcal{T}_0 \cap \mathcal{T}| < s_c$. Suppose
that the size of the shared channel support between consecutive
frames is fixed to be $|\mathcal{T}_i \cap \mathcal{T}_{i+1}| = 9$, $\forall i$, while the believed
quality parameter $s_c$ varies from 8 to 13 (so that the believed
prior support quality is incorrect when $s_c \in \{10, \dots, 13\}$).
Figure 7 illustrates the NMSE of the estimated channel versus the
believed quality parameter $s_c$ under transmit SNR $P = 25$
dB and $s = 18$. From this figure, we observe that the
performance of the M-SP degrades severely when $s_c \ge 10$, and a
larger $s_c$ (i.e., a larger model mismatch) causes a larger
performance degradation. However, the
conservative M-SP is stable and still enjoys performance gains
over the baselines. These results demonstrate the robustness
of the proposed conservative M-SP algorithm under model
mismatch.

VIII. Conclusions and Future Works

In this paper, we consider CS problems with a prior support
and the associated quality information available. Modified
subspace pursuit recovery algorithms are designed to adaptively
exploit the prior support information to enhance the
signal recovery performance. By deploying the tools of block-RIP,
we bound the recovery distortion and we show that the
proposed algorithm converges within $O(\log \mathrm{SNR})$ iterations. To
tolerate possible model mismatch, we have further proposed
a conservative design that is more robust to
incorrect prior support information. Finally, we apply the
proposed framework to channel estimation in massive MIMO
systems with temporal correlation, to further reduce the length
of the channel training pilots.


Appendix 

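A. Proof of Lemma 1

The first two items directly follow from the definition of the block-RIP.
The following proves the third statement. First, we obtain
$\sigma_{\max}\left(\Phi_{[\mathcal{T}_1 \cup \mathcal{T}_2]}^H \Phi_{[\mathcal{T}_1 \cup \mathcal{T}_2]} - I\right) \le \delta_{k_1 + k_2|d}$ from the
block-RIP definition. Second, $\Phi_{[\mathcal{T}_1]}^H \Phi_{[\mathcal{T}_2]}$ is a submatrix of the matrix
$\Phi_{[\mathcal{T}_1 \cup \mathcal{T}_2]}^H \Phi_{[\mathcal{T}_1 \cup \mathcal{T}_2]} - I$. From the property that the spectral
norm of a submatrix is always upper bounded by the spectral
norm of the entire matrix, the third item is proved. The fourth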
inequality in Femmadirectly extends Femma A.3 of p7|. 


B. Proof of Lemma 2

We first introduce the following equality properties:




(26) 

= +^[r,nr2iX'^^nr2i 

(27) 


0 

(28) 

(l-^[Tii^fni) ^[Ti] = 

0 



We next introduce the following inequality properties. Denote
$\sigma_{\min}(A)$ and $\sigma_{\max}(A)$ as the minimum and maximum singular
values of a matrix $A$ respectively, i.e., letting
$A = U_A \Sigma V_A^H$ be the singular value decomposition of
$A$, $\sigma_{\min}(A) = \min(\mathrm{diag}(\Sigma))$ and $\sigma_{\max}(A) = \max(\mathrm{diag}(\Sigma))$.
We have

$$\sigma_{\max}(AB) \le \sigma_{\max}(A)\, \sigma_{\max}(B). \tag{29}$$

$$\sigma_{\min}(A)\, \|B\|_F \le \|AB\|_F \le \sigma_{\max}(A)\, \|B\|_F. \tag{30}$$

Note that the above properties (26)-(30) will be frequently used
in the following proof. Based on the selection criterion of $\mathcal{T}_a$ and $\mathcal{T}_l$,
we obtain that (i) $\mathcal{T}_a = (\mathcal{T}_b \cup \mathcal{T}_c) \cup \mathcal{T}_l$; (ii) $|\mathcal{T}_b \cup \mathcal{T}_c| \le s$,
$|\mathcal{T}_l| \le s$, $|\mathcal{T}| \le s$; (iii) each of the three index sets, i.e.,
$(\mathcal{T}_b \cup \mathcal{T}_c)$, $\mathcal{T}_l$ and $\mathcal{T}$, contains at least $s_c$ elements from $\mathcal{T}_0$.
Therefore,

$$|\mathcal{T}_a| \le s_1 = 2s + \min(0, |\mathcal{T}_0| - 2 s_c). \tag{31}$$

$$|\mathcal{T} \cup \mathcal{T}_l| \le s_1 = 2s + \min(0, |\mathcal{T}_0| - 2 s_c). \tag{32}$$

$$|\mathcal{T} \cup \mathcal{T}_a| \le s_2 = 3s + \min(0, |\mathcal{T}_0| - 3 s_c). \tag{33}$$

Based on (31)-(33), we obtain the following lemma.

Lemma 3 (Iteration Property): In the $l$-th iteration of Algorithm 1,
the following three equations will be satisfied:

D. Proof of equation ( |35p 

From the selection criterion of Step 2.C in Algorithm [1] we 


obtain 


ZlTi+i 


> ||Z[’^I||, which leads to 


> 


Denote P(r„) — 


ZirAT] 

[To] (^ir„]^[r.i 


ZI'^AA+i] 
-1 


(39) 


We further ob¬ 


tain 


Z‘"'“’ = ‘J’fr„]Y = ‘i>fr„]P(r„)Y 


R-i, 


< ^ /1 -f 5, 


s|d 


XinTi+i] 


-f q. (34) =^[ro]P(ro) (®[ra]X[r„] + ^[r\ra]X[r\r„] + N) 


lx[7'\'F+i] 



^ \ 


1 -f 


(l - Ssi\d) 


X[nr„] 


\/r^ 


si|d 


--V- 


x[nr„i 




2S 


S2\d 


Ri 


+ 


25 


S 2 \d 


2-^1 + 5, 


s|d 


(l l^sld) -^/l ^ 


|R(oL<(Ci)' +- 


C 2 n 

-Cl' 


From ||R(o) IIf< <5s|(i ||X||^ + rj and the fact that 
j|R(o||^ = ||$(X-X(o) + N|| 




(' 




N 


CTmax (I - i) ^ 1- F™m (|38|), Using properties 


Axl-C] + -Efr.\ + 


(40) 


where E G given by E[{i_ = 0, 

P(T„)4’[r\Ta]^[T\Tal — 4>[r„]E[ra]- Front equation (40 1 , we 
obtain 


(35) 


(36) 


ZirAT] 

Z['L\'F+i] 

We further obtain 


< 


eIl.] 


1 


F V'l-<5si|d ' 


p. (41) 


> 


X['7'\'F+i] 
E[r. 


- IIXIA'C] 
1 


Proof: The detailed proofs of equations (34)-(36) are
given in Appendices C, D and E respectively. ■

Combining equations (34)-(36), equation (12) in Lemma 2 is
derived. Next, we prove equation (13) in Lemma 2. Based on
(12), we obtain


l-<5, 


Slid 


eIt.] 


< 




$[7;]E['F‘1 

(a) 


(42) 


P(A.)$[r\r„]X™||^ < ^S2\d^l + ^d 


Xinr.) 


F 

(43) 


where (a) comes from the fourth property in Lemma 
Equation ([43| leads to 


(37) 


eIt;: 


< 


'^S2ld\/l + <^S2|d 


Xir\r. 


(44) 


equation ( [T3] l is derived. 

C. Proof of equation (34)

From the expression of $R_{(l+1)}$ in Step 2E of Algorithm 1,
we obtain




F yi - ^si|d 

Combining equation 0 , 0 , 0 and ( |44) i, we obtain 
equation ( (T5] l. 

E. Proof of equation (36)

At the beginning of the $l$-th iteration, the residue matrix
$R_{(l)}$ can be expressed as

R(/) = fl- $r-K,T> 


(*1^1x1 ^+n) 


^[r\7)i ^[fi] 


C5) 


where X = 


X[^\'F 1 

Xa 


and Xa = X^l - 

From the properties in Lemma [T] equation (45 i and equation 
we obtain 


(38) 


Note that I — ^ is a projection matrix hence 

in Lemma 111 equation ([34J is proved. 


||R(o||p. > t/i - ^iid 

We further have the following equation 

25s 2 \ d 


V 


X 


[r\r„i 


- II 2.C1 -|- 5s\d 
A A r t l|X 3-1-r- V- 


< 


(46) 


(47)


Note that equation (36) will be proved by combining equation
(46) with (47). Therefore, we only need to prove (47) in the
following. Since both $(\mathcal{T}_b \cup \mathcal{T}_c)$ and $\mathcal{T}$ contain at least $s_c$ chunks in
$\mathcal{T}_0$, from the selection rule of Step 2A, we have


F. Proof of Theorem 

If Algorithmstops from the condition of ||Rp_|_i) IIf - 
then the obtained solution X = and we obtain 




> 




X-X 


< 


2 ±v 




= . If Algorithm 


which 


derives 


‘^f(%urb)\r]^P) 


(48) 


> 


condition of ||R(;_|_i) IIf 


stops from the 
R(;)||^, then obtained solution 


X = X(;). From equation (12i, we obtain 

||R(i)||^ < ||R-(i+i)||p. < Cl ||R(i)||^ + C2P. 


From this and the fact that From < 0.246, we obtain Ci < 1 and 


^ir\(rcUr6)]^(0 

$^,Ri'n = 0 , we further obtain 

[ill ^ ’ 


The right hand side term in (|49]l is further bounded by 


$ 


H Ty 

im\j%)\{Tur^)] w 

H 


^[(VcUr 6 )\(rurT] ^ 


([ 


^[fi] 


X 


( 


I - 


iTrm] 


N 


— ^S2\d 


X 


+ + ^s\dV 


|4’[r\ra]R-(0| 


= 4- 


H 

lT\Ta] 


> 


([Wd %i]x+( 

®[r\F.] ([ 4'[r\r„] ^\Ta] ]) 




x[nr„] 

xk 


~ \/l + 5s\d'r] 


where X^ is obtained by rewritten 


be ([ 4>[r\rj ^[F] ]) 


Xir\F] 


^[r\7i] ®[7i] 


X 


A 


X to 
. Note that this X 


can always be found because 71 C Ta- Furthermore | |X 


X 


. Continuing the derivation 


in (|^, 


Alli^ A 


we obtain 




[r\raiR -(0 


lT\Ta] 


AcTmin (4>^\7-^]4>[r\ra]) ||X 
— O-max (^4>[k\ra]4’[r„]) ||^a||^ — \J^+ Ss\d'<l 


Combine the results in equation (491, (50i and (|52p, we 
obtain 


^32l<i X +C1 + (53|^p> 




[r\r„: 


1^ ~ <^32^ ||x||^ ~ + &a\d'<l 


which further derives the desired equation (36 1 . 


0211 

1-Ci 

obtain 


. Further from Rp) > 


\d 


X-X 


(49) 


X-X 


< CiTj. Hence equation (141 is proved. 


max I CiT], 


R-(oIIf - 

— f], we 


Next, we prove ([T5|). Note that when min^gT- 


J+V 




|X[fc]||, 


> 


, the identified signal support T must 


be correct, i.e., T Q T- This can be proved via the contra¬ 
diction method (i.e., 3i G T, i 4 T- We obtain ||x — x|| > 

II IIf 


||X[i]||^ > max C 4 P, 


7+ZL 


(14 1 ). From T C T, we obtain 


, which violates equation 


(50) 


X - X = X,^., - $ 


t 


(*;^,x+n) 


N) =-$|t]N 


The left hand side term in equation (|49ll is further bounded by 


nri "[Tiv^m 

which further derives ( fTSj l from Lemma [T] 

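G. Proof of Theorem 2

First, $\|R_{(0)}\|_F \le \sqrt{1 + \delta_{s|d}}\,\rho + \eta$. Second, from (37), after
$n$ iterations in Step 2 of Algorithm 1 the following inequality
holds:

$$\|R_{(n)}\|_F \le \frac{C_2 \eta}{1 - C_1} + (C_1)^n \left( \sqrt{1 + \delta_{s|d}}\,\rho + \eta - \frac{C_2 \eta}{1 - C_1} \right). \tag{53}$$

From (53), when $\gamma > \frac{C_2 \eta}{1 - C_1}$ and $n = n_{co}$, we must obtain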

(51) 


A 


< 


^ ~ ^S2M ~ + <5s|dP- (52) 


$\|R_{(n)}\|_F \le \gamma$ and hence Step 2 of Algorithm 1 must have
stopped after $n_{co}$ iterations, where $n_{co}$ is as given in Theorem 2.
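The convergence argument is a linear recursion of the form $r_{n+1} \le C_1 r_n + C_2 \eta$ with $C_1 < 1$, whose closed form decays geometrically towards $C_2 \eta / (1 - C_1)$. The sketch below evaluates that closed form and the resulting iteration count. The numeric constants are hypothetical (the true $C_1$, $C_2$ depend on the block-RIP constants in Table III), and rounding up to an integer with `ceil` is an addition made for the example.

```python
import math

def residue_bound(n, C1, C2, eta, r0):
    """Closed form of r_{k+1} = C1 * r_k + C2 * eta after n steps, starting at r0."""
    fixed = C2 * eta / (1.0 - C1)
    return fixed + C1 ** n * (r0 - fixed)

def iterations_needed(gamma, C1, C2, eta, r0):
    """Smallest integer n with residue_bound(n) <= gamma."""
    fixed = C2 * eta / (1.0 - C1)
    if gamma <= fixed:
        raise ValueError("gamma must exceed C2 * eta / (1 - C1)")
    if r0 <= gamma:
        return 0
    # log base C1 (< 1) of a ratio in (0, 1) is positive.
    return math.ceil(math.log((gamma - fixed) / (r0 - fixed), C1))

# Hypothetical constants; r0 plays the role of sqrt(1 + delta) * rho + eta.
C1, C2, eta, r0, gamma = 0.5, 1.0, 0.1, 10.1, 0.5
n_stop = iterations_needed(gamma, C1, C2, eta, r0)

# Cross-check the closed form against the recursion itself.
r = r0
for _ in range(n_stop):
    r = C1 * r + C2 * eta
```

Since `n_stop` depends on `r0` only through a logarithm, scaling the signal energy by a constant factor adds a constant number of iterations, which is the $O(\log \mathrm{SNR})$ behaviour stated in the paper.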

H. Proof of Theorem 3

Note that for the conservative M-SP, we have (i) $|\mathcal{T}_a| \le 2s + s_c$,
$|\mathcal{T}_a \cup \mathcal{T}| \le s_3 = 3s + \min(s_c, |\mathcal{T}_0| - |\mathcal{T}_0 \cap \mathcal{T}|)$,
$|\mathcal{T}_l \cup \mathcal{T}| \le 2s$; (ii) equations (48) and (39), for Step 2A and
Step 2C respectively, hold no matter whether the quality
information $s_c$ is correct or not. Following the proof in
Appendix B, we obtain the following iteration property
for the conservative M-SP,

$$\|R_{(l+1)}\|_F \le C_5 \|R_{(l)}\|_F + C_6 \eta \tag{54}$$

where $C_5$, $C_6$ are modified correspondingly (compared with
their counterparts $C_1$, $C_2$) and are given in Table IV. On
the other hand, if Algorithm [1] stops from the condition of 
p < "f, then the obtained solution X = 


||R-(/+i) 
satisfies 


X-X 


< 


'y+n 


similar to Appendix 


F - I-C 5 

— 1 ], we obtain 


(/+i)||p > IpR. 

. Furthermore, from 


If Al¬ 


io IIf’ 


gorith m^ stops from the condition of 11 R 
from (54i, we obtain ||R(;) L < . 

R(z) > v'l-<523id||x-X 

CrT]. Subsequently, equation (19i in Theorem is proved. 


X-X 


< 


Based on (19), equation (20) can be obtained similar to